Gender versus sex in predicting outcomes of traumatic brain injury: a cohort study utilizing large administrative databases

Understanding the factors associated with elevated risks and adverse consequences of traumatic brain injury (TBI) is an integral part of developing preventive measures for TBI. Brain injury outcomes differ based on one’s sex (biological characteristics) and gender (social characteristics reflecting norms and relationships); however, whether it is sex or gender that drives differences in early (30-day) mortality and discharge location post-TBI is not well understood. In the absence of a gender variable in existing data, we developed a method for “measuring gender” in 276,812 residents of Ontario, Canada who entered the emergency department and acute care hospitals with a TBI diagnostic code between April 1st, 2002, and March 31st, 2020. We applied logistic regression to analyse differences in diagnostic codes between the sexes and to derive a gender score that reflected social dimensions. We used the derived gender score along with a sex variable to demonstrate how it can be used to separate the relationships between sex, gender and TBI outcomes after severe TBI. Sex had a significant effect on early mortality after severe TBI with a rate ratio (95% confidence interval (CI)) of 1.54 (1.24–1.91). Gender had a more significant effect than sex on discharge location. A person expressing more “woman-like” characteristics had lower odds of being discharged to rehabilitation versus home with odds ratio (95% CI) of 0.54 (0.32–0.88). The method we propose offers an opportunity to measure a gender effect independently of sex on TBI outcomes.

Table 1. Characteristics of datasets used in each analysis. TBI: traumatic brain injury; Dept.: Department; ADG: Johns Hopkins' Aggregated Diagnosis Groups; Q1 = 1st quartile; Q3 = 3rd quartile. Data given as median (Q1, Q3) for continuous variables or (%) for categorical variables. a Subjects with severe TBI who had recorded survival status at day 30 after the first TBI event. b Subjects who had a record of discharge location from acute care hospitals and non-missing ADG score. c Based on non-missing records.
In total, 281 ICD-10-CA diagnostic codes were associated with sex (50 of them were associated with higher odds of being female, and 231 with higher odds of being male) in both datasets after Bonferroni correction and were further included in the multiple logistic regression model to define gender score. The descriptions of the top 10 ICD-10-CA diagnostic codes associated with being male (Table 2) and with being female (Table 3) are presented. The codes with the greatest separation of male from female persons with TBI expressed gender-based division of labour and gender-based violence, highlighting normative roles, relationships, and behaviours ascribed to male and female persons on the basis of biological sex. As such, we assigned "man-like" and "woman-like" titles on a gender score continuum, where 0 refers to the strongest man-like and 1 to the strongest woman-like characteristics. We derived these terms for convenience, reflecting the methodology used.
The codes associated with a higher degree of being "man-like" than "woman-like" reflect distinct behavioural and social characteristics, such as occupations (fall from scaffolding for males) and risk-taking behaviours (motorcycle riding for males). Codes associated with a higher degree of being "woman-like" (see Supplementary Table 5 for the full list of codes) included gender-related vulnerabilities (partner violence related codes for females).
The final logistic regression model that defined gender scores as a probability of being female included 281 ICD-10-CA codes. Figure 1 presents the distribution of gender scores in male and female sexes in the test dataset, which is heavier in the left tail (lower values), indicating that scores were skewed towards defining "man-like" persons more than "woman-like" persons; this is possibly because approximately 80% of the included diagnostic codes showed higher odds for male. Additional analysis investigating the relationship between gender score and age showed that the overall pattern of distribution was similar among different age groups (Supplementary Fig. 5).
This analysis included 4389 patients in the test dataset with a severe TBI who had a survival status recorded at day 30 after their first TBI event, of whom 402 (9.2%) died within 30 days of their injury event. Characteristics of the datasets are presented in Table 1 and Supplementary Table 4.
In TBI-related excess mortality prediction (Table 4), all models contained the following control variables: age, Aggregated Diagnosis Groups (ADG) score, rurality indicator, income quintile, and cause of injury, as well as population mortality as an offset. In the sex only model (Model 1), the rate ratio (RR) for sex was 1.54 (1.24-1.91). In the gender score only model (Model 2), the RR for gender score was 2.02 (1.02-4.0).
To compare the models, we first look at Model 3. Sex was a significant predictor with p-val = 0.0007 (likelihood ratio test (LRT) = 11.50, df = 1) while gender score was not significant with p-val = 0.28 (LRT = 1.18, df = 1). This is very good evidence that sex is a better predictor than gender (i.e., the sex only model is preferred over the gender score only model in predicting TBI-related excess mortality) 15. To directly quantify the strength of this evidence, the Bayes factor (BF) was calculated from the LRT statistics in Model 3, where the BF for Model 1 (sex only) over Model 2 (gender score only) was 174, which is considered "very strong" evidence on the Kass-Raftery scale 16.

Predicting discharge location
This analysis included 7343 persons in the test set who had a record of discharge location from acute care hospitals and non-missing ADG scores. The cohort's characteristics are presented in Table 1 and Supplementary Table 3.
In Table 5, all models were controlled for age, length of stay (LOS), ADG score, rurality, and income quintile. Table 5 shows that both sex and gender score were significant in their separate models with a p-val = 0.0224 (LRT = 13.10, df = 5) and p-val < 0.0001 (LRT = 38.92, df = 5), respectively. In the sex only model (Model 1), the odds ratio (OR) for being discharged to LTC (versus home) for females versus males was 2.14 (1.09-4.22), making sex a significant predictor for this category. In the gender score only model (Model 2), gender score was a significant predictor for two discharge locations: the OR for "other" (versus home) for "woman-like" versus "man-like" was 0.34 (0.22-0.51), and the OR for rehab (versus home) was 0.54 (0.32-0.88). To further investigate the relationship between gender score and the "other" discharge location, we fit a series of logistic regression models comparing the chances of going to a particular location included in "other" as compared to "home" (Supplementary Table 6). These models showed that the relationship observed in the main model could be driven by the largest subgroup within the "other" category, namely, those discharged to another hospital/acute care facility, because more "woman-like" persons had a lower chance of being discharged to that location (versus home) as compared to more "man-like" persons, with an OR of 0.22 (0.13-0.40).
To compare the sex only model versus the gender only model, we looked at Model 3, which contains both sex and gender effects. Gender score was significant with p < 0.0001 (LRT = 34.41, df = 5) while sex was not significant with p = 0.126 (LRT = 8.59, df = 5). This is very good evidence that gender score is the stronger predictor of acute care hospital discharge location, i.e., the gender score only model is preferred over the sex only model 15. To directly quantify the strength of this evidence, BF was calculated from the LRT statistics in Model 3, where the BF for the gender score only model over the sex only model was 4 × 10^5, which is considered "very strong" evidence on the Kass-Raftery scale 16.

Table 4. Models predicting TBI-related mortality. CI: confidence interval; LRT: Likelihood-Ratio Test; df: degrees of freedom. Poisson survival models using binary sex and continuous gender score (1 is more woman-like vs. 0 is more man-like) for test set with N = 4389.
Model predictor(s) Rate ratio (95% CI) Likelihood ratio (df, LRT p-value)
Model 1 Sex (Female = 1) 1.54 (1.24-1.91)

Discussion
We utilized ICD-10-CA diagnostic codes billed for patients with TBI during their hospital or emergency department (ED) visits to derive gender-related characteristics of male and female persons with TBI. We applied Lippa and Connelly's "gender diagnosticity" concept, which refers to a "probability that an individual is predicted to be a male or a female based on some set of gender-related diagnostic indicators" 14, and showed how this score can help in distinguishing between sex and gender in the study of TBI outcomes. Prior research used the concept of gender diagnosticity to construct a gender score based on information derived from psychosocial variables and showed that gender score was associated with cardiovascular disease risk factors, independently of biological sex 17.
To the best of our knowledge, this is the first large-scale population-based study using health administrative data that investigated sex and gender effects in TBI outcomes simultaneously. To achieve this goal, we constructed a gender score metric based on information from ICD-10-CA diagnostic codes recorded during TBI-related ED or acute care hospital visits, and then used this score along with biological sex to predict early mortality and discharge location. Biological sex and gender score characterized persons with TBI differently and had distinctive predictive effects for early mortality and acute care discharge location. There is evidence of a very strong effect of sex in predicting early mortality and of gender score in predicting discharge location, based on the Kass-Raftery score criteria 18. Therefore, these findings can be used to alert clinicians and policymakers to these distinctive effects, and to develop preventive and rehabilitation strategies. This study also provides researchers who have access to large administrative healthcare databases with a method to derive a gender score in their population of interest and use it in their analysis to predict clinically and functionally meaningful outcomes.
As expected, the gender score metric we created was able to separate man-like from woman-like patients based on gender-based division of labour and gender-based violence indicators, which clearly differs from biological sex, contributing important explanatory power in understanding TBI outcomes. The distribution of the score towards woman-like characteristics in our study was opposite to results reported earlier in a cohort of younger persons with myocardial infarction 19, where researchers found a more asymmetrical distribution with a stronger clustering of male persons in the man-like characteristics and a broader distribution of female persons over the whole gender score continuum. Our results across adulthood ages suggest that female patients might possess their woman-like characteristics more strongly in the fifth and sixth decades of life, whereas male patients acquire a wider range of characteristics on the gender score continuum, although their man-like characteristics were more profoundly seen at younger ages. Future studies should consider derivation of gender scores in population-based TBI research by the decades of life.
The significance of studying biological sex as a separate entity from gender-related characteristics in early mortality after TBI has been increasingly emphasized in preclinical 20 and clinical 21 research. It has been suggested that the female hormone oestrogen is neuroprotective, acting on the steroidogenic central nervous system to attenuate neural damage post-injury, particularly in females, given the occurrence of the hormone at higher levels in females relative to males 4. Several mechanisms 22,23 of action have been suggested for its neuroprotective capacity, including post-injury levels of brain-derived neurotrophic factor, given its role in the survival, differentiation, and outgrowth of neurons, and its purported regulation by oestrogen. As the level of oestrogen changes over the lifetime of female persons, with low points at the beginning and end of life, if the hormone is to afford protection following TBI 24, it is conceivable that its influence would be most potent in the adulthood ages we studied as opposed to early or later life, which remains to be explored in future research.
Different gender-related characteristics, including societal norms, roles, and responsibilities (i.e., gender-based division of labour), gender-based violence, and gender inequity in access to and control over resources, have been reported as being important to the socially driven outcomes after TBI 3. Prior research has shown that female patients are more likely than male patients to be discharged to care facilities versus home after TBI 25,26, possibly due to differences in the existing familial and social support. In our cohort of adults with TBI, we observed, in line with prior research and our hypotheses, that woman-like gender characteristics were a predictor of lower probability of being discharged to "rehabilitation" after an acute care hospital stay, even after controlling for relevant variables. The gender score was shown to contain gender-related characteristics such as "assault by spouse or partner" and "physical abuse", among others, which turned out to be stronger predictors of discharge location than biological sex. Considering an evolving society with closing gender inequity gaps in the household 27 and global efforts to eradicate gender-based violence 28, further research is imperative to evaluate whether the effect of gender score on discharge locations would diminish with time. Furthermore, gender-related characteristics among older persons who are women may be more impactful on discharge location than among younger persons. Future investigation into children, adolescents, and older persons' groups is needed, which may show a different influence of gender-related characteristics on discharge location after TBI.
There are several limitations to this analysis. We used the same information related to TBI from ICD-10-CA codes to create a gender score metric and investigated its relationship with TBI outcomes. Gender is a multidimensional notion, and the metric we built only incorporates limited dimensions of gender, such as risk-taking behaviours, gender-based violence, and employment/occupations. However, we believe that defining gender score based on characteristics that predict a person to be more likely a male or a female is in keeping with the existing methods to measure gender. Also, the resulting gender score reflects the degree of "man-like" more than of "woman-like", which is an important finding 22. Further, our "sex" variable was binary. The relatively small number of persons in the dataset who did not identify as a male or a female did not make matched analyses feasible. We also recognize the limitations of using administrative data overall.
In conclusion, this study, to the best of our knowledge, is the first example of applying the concept of gender diagnosticity to ICD-10-CA diagnostic code data in a province-wide Canadian cohort of patients with TBI. When creating a potentially time-dependent gender score and testing its association with outcomes of interest (i.e., excess mortality and discharge locations), we defined relatively restricted (from a historical perspective) time windows for analysis. The derived gender score metric allowed us to gain additional insights into the relationship between sex, gender, and TBI outcomes when no explicit measure of gender is available in a data source comprised predominantly of ICD-10-CA codes. Our results highlight that sex and gender effects are expressed differently in TBI outcomes that are driven to a greater extent by physiological responses to injury in the context of genetics, endocrine, metabolic, and immune systems (i.e., sex) or by interpersonal family and community relationships and socioeconomic factors within the person's living environment (i.e., gender). More research is needed to further test and validate this approach in different age cohorts and across different clinical conditions, as well as gender metric variability over time.

Study design and data sources
For this retrospective cohort study, we accessed the population-wide health administrative data for all publicly funded services provided to the residents of Ontario, Canada from the ICES (formerly the Institute for Clinical Evaluative Sciences) 18 data repository. We combined the records for ED visits with acute care visits, gathered from the National Ambulatory Care Reporting System (NACRS) and the Discharge Abstract Database (DAD), respectively. These datasets contained primary and secondary diagnoses recorded using ICD-10-CA codes (up to 10 codes per record in NACRS, and up to 25 codes in DAD) as well as clinical, demographic, and socio-economic information about each person. We only included the first incidence of a TBI-related visit during the study period, defined as the "first TBI event", for patients who were discharged from the ED or acute care hospitals with a TBI diagnostic code (S020, S021, S023, S027, S028, S029, S040, S071, S06) between April 1st, 2002, and March 31st, 2020 29. We restricted the cohort to patients who were aged 16-64 years in order to ensure homogeneity of gender attributes within the adult group (versus the pediatric or senior population). Data on age, sex, and calendar year specific death rates in the general population were extracted from Statistics Canada life tables 30. The Abbreviated Injury Severity Score, generated according to previously published severity classifications, was used as a measure of TBI severity; it was measured on a 6-point scale based on ICD-10-CA codes and categorized as mild (1-2), moderate (3), or severe (≥ 4) 31,32.
Data on discharge locations were derived from DAD. The combined dataset was randomly split into 50% for training, 25% for validation, and 25% for testing to prevent overfitting and to ensure model validation. Training and validation sets were used for model building and internal validation, whereas the reported results were based on the test set performance.
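The 50/25/25 random split described above can be illustrated with a minimal Python sketch. The function name `split_cohort`, the fixed seed, and the use of integer person identifiers are assumptions made for illustration only; the source does not state how the split was implemented.

```python
import random


def split_cohort(person_ids, seed=0):
    """Randomly split person identifiers into 50% training, 25% validation, 25% test.

    The seed and the rounding rule (integer division) are illustrative assumptions.
    """
    ids = list(person_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle of a copy
    n_train = len(ids) // 2
    n_valid = len(ids) // 4
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


train_ids, valid_ids, test_ids = split_cohort(range(1000))
```

Splitting by person rather than by visit keeps each person's first TBI event in exactly one of the three sets, which matches the one-record-per-person cohort definition above.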

Ethical approval and informed consent
Approval: The study protocol was approved by the Research Ethics Boards of the University of Toronto (20-5823) and the University Health Network (#20-5823). All methods were carried out in accordance with the relevant guidelines and regulations.
Informed consent: This research utilised encrypted administrative health data authorised under Section 45 of Ontario's Personal Health Information Protection Act. The data are housed at the Institute for Clinical Evaluative Sciences (ICES), an independent, non-profit research institute, whose legal status under Ontario's health information privacy law allowed it to collect and analyse healthcare and patient characteristics data, without individual patient consent, for health system evaluation and improvement.

Gender score derivation
We used a logistic regression approach to derive a gender score reflecting the probability of each person being male or female based on a set of indicator variables of diagnostic codes that reflect biological (associated with binary sex, such as diseases) and social (associated with behavioural and other socially defined characteristics considered as man-like or woman-like) attributes of people. Each person's sex was compiled from the Registered Persons Database 33. The ICD-10-CA diagnostic codes recorded in each TBI visit were converted into a matrix of indicator variables for each distinct diagnostic code using natural language processing tools (creating a document-term matrix using the R package "tm" 34). Diagnostic codes that were not common, i.e., present in a single person in the training and/or validation datasets, as well as codes that occurred only in males or only in females, were removed from both sets; the latter was done to ensure derived gender characteristics were relevant to both sexes. To select the subset of diagnostic codes to include in the gender score model, we assessed the significance of each unique diagnostic code in predicting the sex (Female = 1) of persons who were diagnosed with that code by fitting univariate logistic regression models. All diagnostic code indicators that were significant at the 5% level after Benjamini-Hochberg correction 35,36 in both training and validation sets were subsequently included in the gender score model predicting the probability of sex reported as female in the training set. Model coefficients obtained from the training dataset were then used to calculate the final gender scores in the test set. The final gender score was therefore a continuous variable ranging from 0 (man-like) to 1 (woman-like), estimating the probability of a person being male or female.
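The two steps described in this section, screening codes by Benjamini-Hochberg-corrected significance in both datasets and then scoring a person from fitted coefficients, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation (which used R): the p-values, the code names `CODE_A` to `CODE_D`, the intercept, and the coefficients are all hypothetical placeholders.

```python
import math


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected under BH."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank k with p_(k) <= (k / m) * alpha defines the cutoff.
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        keep[idx] = rank <= cutoff_rank
    return keep


def select_codes(codes, p_train, p_valid, alpha=0.05):
    """Keep codes significant after BH correction in BOTH training and validation."""
    keep_train = benjamini_hochberg(p_train, alpha)
    keep_valid = benjamini_hochberg(p_valid, alpha)
    return [c for c, a, b in zip(codes, keep_train, keep_valid) if a and b]


def gender_score(codes_present, intercept, coefficients):
    """Predicted probability of female sex from the codes recorded for a person."""
    eta = intercept + sum(coefficients.get(c, 0.0) for c in codes_present)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical univariate p-values for four placeholder codes.
codes = ["CODE_A", "CODE_B", "CODE_C", "CODE_D"]
selected = select_codes(codes, [0.001, 0.04, 0.03, 0.5], [0.002, 0.01, 0.6, 0.7])

# Hypothetical coefficients: a negative value pushes the score towards man-like (0).
score = gender_score({"CODE_A"}, intercept=0.0, coefficients={"CODE_A": -1.2})
```

A score below 0.5 in this sketch corresponds to the man-like side of the continuum and a score above 0.5 to the woman-like side.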

Predicting TBI-related excess mortality
Following our previous research 8, we defined the acute phase of mortality due to injury sustained during a TBI event (in some studies it was called TBI-related mortality 8,37) as death within a 30-day window. Exploratory analysis showed that 64% of the people who died within 30 days had a severe TBI diagnosis (Supplementary Table 1); therefore, the analysis was restricted to this subpopulation. In addition, people with unknown survival status 30 days following their first TBI event, or with unknown injury severity, were excluded from this analysis.
The outcome was therefore defined as time-to-death within 30 days of the first TBI event, and patients who were alive on the 30th day after the first TBI event were censored. Covariates in the model were selected based on previous research 8 and included age as a continuous variable, mechanism of injury (determined using major external cause of injury group codes 38: falls, struck by/against object, motor vehicle collisions, cyclist collisions, other), rurality indicator, income quintile (linear predictor), and Johns Hopkins Aggregated Diagnosis Groups (ADG) score, which is a weighted score representing the presence or absence of 32 ADG diagnosis groups as an indicator of comorbidities 39 (Supplementary Table 2). To control for population death rates, we extracted the age, sex, and calendar year specific death rates for each person from the Statistics Canada life tables 18 and used them as an offset term in the model. The excess mortality rate was modelled using a Poisson regression model 8,40-42 by treating the survival status of each subject as observations from a Poisson distribution with a rate parameter specific to each time interval (day); the model is equivalent to the piecewise exponential survival model 42. Since mortality status is recorded daily or in discrete time intervals, the Poisson distribution naturally fits the data structure, and the model allows the population-wide death rate to be accounted for.
As part of data pre-processing, the mortality dataset was transformed into a person/period format, with 1 record per day until death or censoring occurred at day 30, with an event indicator equal to 0 for each day the person was alive and 1 for the day of death. To control for the underlying population death rate, the daily death rate (the yearly death rate divided by 365) was calculated from Statistics Canada life tables 30 for each patient, matched by sex, age, and year of the first TBI-related healthcare visit during our study period, and was added to the model as an offset term. The resulting model was defined as follows (Eq. 1):
$$\log(\lambda_i) = \log(\lambda_{age,sex,year}) + X\beta,$$
where $\lambda_i$ is the death rate for day $t_i$ from the first TBI event, $\lambda_{age,sex,year}$ is the average daily death rate derived from Statistics Canada 30 mortality tables, matched by patient age, sex and year of the first TBI event, $X$ is the matrix of predictors, and $\beta$ is the vector of regression coefficients. For patients who died on day 0, an interval of length 0.5 was assigned, and the population rate included in the model was adjusted accordingly.
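The person/period transformation described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function name, the row layout, and the convention that days are numbered 1 to 30 with a single half-day record for deaths on day 0 are assumptions; only the daily rate (yearly rate divided by 365), the event indicator, and the 0.5 interval for day-0 deaths come from the text.

```python
import math


def expand_person_period(day_of_death, yearly_pop_rate, max_day=30):
    """Expand one person into daily records with an event flag and a log offset.

    day_of_death: integer day of death counted from the first TBI event,
                  or None if the person was alive at day `max_day` (censored).
    yearly_pop_rate: population death rate per year matched on age, sex and year.
    """
    daily_rate = yearly_pop_rate / 365.0
    if day_of_death is not None and day_of_death > max_day:
        day_of_death = None  # death after the window counts as censored

    if day_of_death == 0:
        # Death on day 0: one record covering half a day.
        return [{"day": 0, "length": 0.5, "event": 1,
                 "offset": math.log(daily_rate * 0.5)}]

    last_day = max_day if day_of_death is None else day_of_death
    rows = []
    for day in range(1, last_day + 1):
        rows.append({
            "day": day,
            "length": 1.0,
            "event": 1 if day == day_of_death else 0,
            # Offset: log of the expected population deaths in this interval.
            "offset": math.log(daily_rate * 1.0),
        })
    return rows


censored = expand_person_period(None, yearly_pop_rate=0.00365)
died_day3 = expand_person_period(3, yearly_pop_rate=0.00365)
died_day0 = expand_person_period(0, yearly_pop_rate=0.00365)
```

Each row would then enter a Poisson regression with `event` as the response and `offset` as a fixed term in the linear predictor.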

Predicting discharge location
Discharge location prediction was restricted to acute care visits. Patients who were alive when discharged, with a recorded discharge location and non-missing baseline ADG score, were included in this analysis (Table 1). The outcome variable was discharge location from acute care, categorized into six groups: discharged home, discharged home with support, inpatient complex continuing care (CCC), long term care (LTC), rehab, and other. The category "other" was composed of smaller subgroups including transfer to another inpatient care/hospital/acute care facility, long term/continued care, other ambulatory care/palliative care/hospice/addiction treatment centers/jails, died in facility, left against medical advice, and signed out against medical advice 25,43. Covariates identified from a previous study included age, length of stay (LOS), ADG score, rurality indicator, and income quintiles 25. The most common discharge location (Supplementary Table 3) was "discharged home", which was used as the reference level in the baseline category logistic regression models 44.
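To make the baseline category logit structure concrete, the following Python sketch converts linear predictors for the five non-reference categories into probabilities for all six discharge locations, with "home" as the reference. The numerical values of the linear predictors are hypothetical and chosen only for illustration; the category labels follow the text.

```python
import math

REFERENCE = "home"


def discharge_probabilities(linear_predictors):
    """Baseline category logit: eta_j = log(P(category j) / P(home)).

    linear_predictors maps each non-reference category to its eta_j.
    """
    denom = 1.0 + sum(math.exp(eta) for eta in linear_predictors.values())
    probs = {REFERENCE: 1.0 / denom}
    for category, eta in linear_predictors.items():
        probs[category] = math.exp(eta) / denom
    return probs


# Hypothetical linear predictors for one person (not estimates from the study).
eta = {"home with support": -1.0, "CCC": -3.0, "LTC": -3.5,
       "rehab": -1.5, "other": -2.0}
probs = discharge_probabilities(eta)

# An odds ratio of 0.54 for rehab versus home shifts that eta by log(0.54).
eta_shifted = dict(eta, rehab=eta["rehab"] + math.log(0.54))
probs_shifted = discharge_probabilities(eta_shifted)
odds_ratio = (probs_shifted["rehab"] / probs_shifted["home"]) / (probs["rehab"] / probs["home"])
```

The last lines show why an odds ratio in this model is always read relative to the reference category: the ratio of rehab to home odds changes by exactly the stated factor, whatever the other categories do.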

Measuring effects of sex versus gender score
We compared the predictive performance of gender score versus biological sex in predicting TBI-related outcomes (early mortality and discharge location) using the test set. To achieve this, we considered the following three models for each outcome: Model 1 with binary sex and control variables as covariates, Model 2 with gender score and control variables, and Model 3 with both sex and gender score in addition to the control variables. We used two metrics/statistics to assess the effects of sex and gender score in predicting TBI-related outcomes: (1) profile likelihood based confidence intervals (CI) and (2) [...] any relevant variables. We reported the p-value (p-val) and the degrees of freedom (df) for the LRT; p-val < 0.05 was considered statistically significant. All model-derived estimates are reported with 95% CI, and if the CI contains 1, the estimate is not significant at the 5% level. Effect estimates were reported to compare the unit difference in gender score and sex. Sex is a binary variable coded as Female = 1 and Male = 0, and gender score is a continuous variable ranging from 0 to 1 (towards 0 is more man-like and towards 1 is more woman-like).
To assess whether sex or gender score is more informative in predicting TBI-related outcomes, we can use Model 3 to assess whether Model 1 or Model 2 is preferred, using an indirect and a direct approach for comparison. The indirect approach is to test the significance of the sex effect and the gender effect: if one effect is significant and the other is not, then that is evidence that the model with just the significant effect (and the control variables) is the preferred model 15 for that outcome. We can also use Bayes factors (BF) 16 to directly measure the strength of evidence that the data support one model versus the other and, therefore, prefer one effect over the other. A more detailed description of the two approaches is presented below 45.

Statistical methods: comparing different models
An important question in this paper is to decide if a TBI-related outcome is more likely due to sex or gender. From a statistical testing point of view, this translates to a question of whether the evidence from the data better supports a model with sex effect only or a model with gender effect only. Normally, we would test this using nested models, where one model contains a subset of the predictors of the other model, and test if there is a significant difference between the fit of the two models. However, in our case we have two models that are not nested, invalidating the direct procedure for comparison. Instead, we can assess the effects of sex and gender by looking at a full model which contains both variables and checking whether one effect is statistically significant while the other effect is not. Cox discussed such a testing method to compare a model with only effect 1 versus a model with only effect 2: one can do this by putting them both in a full model which contains both effects 15. According to Cox, "If [effect 1] is very highly significant whereas [effect 2] is not [in the full model], a clear conclusion can be reached that the data agree better with [the model with effect 1] than the [model with effect 2]" 15. This provides a basic procedure, but it does not directly measure the weight of evidence that one model is preferred over the other.
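The indirect approach relies on likelihood ratio tests of each effect within the full model. As a minimal sketch of that calculation, the Python code below computes the LRT statistic from two log-likelihoods and its chi-square p-value for odd degrees of freedom using a closed-form expression, then reproduces the p-values reported in the Results from the reported LRT statistics. The function names are illustrative assumptions; a statistics package would normally be used instead of the hand-written tail function.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for a reduced model nested in a full model."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_odd(x, df):
    """Upper tail probability of a chi-square variable for odd df (1, 3, 5, ...)."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    if x <= 0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2.0))
    series = 0.0
    denom = 1.0
    for j in range(1, (df - 1) // 2 + 1):
        denom *= 2 * j - 1           # 1, 1*3, 1*3*5, ...
        series += x ** (j - 0.5) / denom
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * series


# LRT statistics reported for Model 3 in the Results.
p_sex_mortality = chi2_sf_odd(11.50, 1)      # sex effect, mortality
p_gender_mortality = chi2_sf_odd(1.18, 1)    # gender score effect, mortality
p_gender_discharge = chi2_sf_odd(34.41, 5)   # gender score effect, discharge location
p_sex_discharge = chi2_sf_odd(8.59, 5)       # sex effect, discharge location
```

With one degree of freedom for mortality (a single coefficient) and five for discharge location (one coefficient per non-reference category), the same pattern appears in both outcomes: one effect is clearly significant in the full model and the other is not.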
Theoretically, a way to directly compare the two models is to calculate the Bayes Factor (BF) 16. This provides us with an estimate of the weight of evidence from the data that it supports one model over the other. We can approximate BF using the Bayesian Information Criterion (BIC) 45. As mentioned in the main text, we have the following three models: Model 1 contains sex effect only, Model 2 contains gender effect only, and Model 3 contains both sex and gender effects, in addition to the same control variables for all three models. Let $LL_i$ be the log-likelihood ($LL$) for the $i$th model. Then, the difference in BIC ($\Delta BIC_{12}$) comparing the sex only model (Model 1) to the gender only model (Model 2) is equal to (Eq. 2):
$$\Delta BIC_{12} = BIC_1 - BIC_2 = -2(LL_1 - LL_2) + (d_1 - d_2)\log(n),$$
where $d_i$ is the degrees of freedom for model $i$ and $n$ is the sample size. Finally, $BF_{12}$ comparing the weight of evidence that supports Model 1 over Model 2 can be approximated by (Eq. 3):
$$BF_{12} \approx \exp(-\Delta BIC_{12}/2).$$
Note, if one wants to know the weight of evidence for Model 2 over Model 1, then $BF_{21}$ can be calculated as $BF_{21} = 1/BF_{12}$.
In this paper, we are comparing the sex effect only model versus the gender effect only model, where both models include the same set of control variables. Since we are only comparing two models and both models have the same degrees of freedom, $\Delta BIC_{12}$ turns out to be the difference in the likelihood ratio test (LRT) statistics used to test for the significance of the sex effect and the gender effect in the full model. To see this, note that the $LRT_S$ to test for the sex effect in the full model compares the $LL$ of the model without sex to the $LL$ of the full model. The model without sex is the model with gender only, which is Model 2. So, the $LRT_S$ is equal to the following (Eq. 4):
$$LRT_S = 2(LL_3 - LL_2).$$
The LRT for the gender effect, $LRT_G$, is similar except it uses $LL_1$ instead of $LL_2$. Since both Model 1 and Model 2 have the same degrees of freedom, it can be shown that (Eq. 5):
$$\Delta BIC_{12} = -2(LL_1 - LL_2) = LRT_G - LRT_S,$$
which implies that (Eq. 6):
$$BF_{12} \approx \exp\left((LRT_S - LRT_G)/2\right).$$
Kass and Raftery suggest the following criteria to interpret BF 16: a BF of 1 to 3 is "not worth more than a bare mention", 3 to 20 is "positive" evidence, 20 to 150 is "strong" evidence, and above 150 is "very strong" evidence.
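As a numerical check of this approximation, the Python sketch below computes the Bayes factor as exp((LRT_A - LRT_B) / 2), where LRT_A and LRT_B are the Model 3 likelihood ratio statistics for the two competing effects, and applies it to the statistics reported in the Results. The function names are illustrative assumptions, and the category thresholds (3, 20, 150) are the commonly quoted Kass and Raftery cut-offs on the BF scale.

```python
import math


def bayes_factor_from_lrt(lrt_preferred, lrt_other):
    """Approximate BF for the model containing the 'preferred' effect over the
    model containing the 'other' effect, valid when both models have the same
    number of parameters so the BIC penalty terms cancel."""
    return math.exp((lrt_preferred - lrt_other) / 2.0)


def kass_raftery_label(bf):
    """Verbal category for a Bayes factor on the BF scale."""
    if bf < 1:
        return "evidence favours the other model"
    if bf <= 3:
        return "not worth more than a bare mention"
    if bf <= 20:
        return "positive"
    if bf <= 150:
        return "strong"
    return "very strong"


# Mortality: sex (LRT = 11.50) versus gender score (LRT = 1.18) in Model 3.
bf_mortality = bayes_factor_from_lrt(11.50, 1.18)

# Discharge location: gender score (LRT = 34.41) versus sex (LRT = 8.59) in Model 3.
bf_discharge = bayes_factor_from_lrt(34.41, 8.59)

print(round(bf_mortality), kass_raftery_label(bf_mortality))
print(f"{bf_discharge:.1e}", kass_raftery_label(bf_discharge))
```

Both values agree with those reported in the Results (174 for mortality and approximately 4 × 10^5 for discharge location), and both fall in the "very strong" category.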

Table 2. ICD-10-CA codes with the highest effects (OR and 95% CI) in predicting male vs. female in the training set. ICD-10-CA: International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Canada; OR: odds ratio; CI: confidence interval. OR estimates are not used for further inference; hence, only point estimates are presented.

Table 3. ICD-10-CA codes with the highest effects (OR and 95% CI) in predicting female vs. male in the training set. ICD-10-CA: International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Canada; OR: odds ratio; CI: confidence interval. OR estimates are not used for further inference; hence, only point estimates are presented.

Table 5. Models predicting discharge location post-TBI hospitalization. OR: odds ratio; CI: confidence interval; LRT: Likelihood-Ratio Test; df: degrees of freedom; LTC: Long Term Care; CCC: Inpatient Complex Continuing Care. Baseline category logit model using binary sex and continuous gender score (1 is more woman-like vs. 0 is more man-like) for test set with N = 7343.